IEICE global.ieice.org Site

Author Search Result

[Author] Hiroaki KUNIEDA(45hit)

21-40hit(45hit)

Two Dimensional Space Partition Recursive Filtering Algorithm on Rectangular Processor Array
Yoshinori TAKEUCHI Hiroaki KUNIEDA

PAPER-Digital Signal Processing

Vol: E74-A No:1
Page(s): 42-48
This paper studies a method of parallel processing for two-dimensional recursive filters on a multiprocessor system. Conventional recursive filtering is sequential and locally iterative, which amounts to global processing. We decompose this global processing into space-partitioned processing with a few global communications. We derive an efficient parallel algorithm for two-dimensional recursive filtering using Roesser's model and investigate its speed-up rate, efficiency and degradation.
Memory Sharing Processor Array (MSPA) Architecture
Dongju LI Hiroaki KUNIEDA

PAPER

Vol: E79-A No:12
Page(s): 2086-2096
In this paper, we propose a new processor array architecture with effective data storage schemes that meets the practical requirement of a reduced number of processor elements. Its design method is shown to be drastically simpler than that of the popular systolic arrays. This processor array, which we call the Memory Sharing Processor Array (MSPA), consists of a processor array, several memory units, and some address generation hardware units used to minimize the number of I/O ports. The MSPA architecture and its design methodology aim to overcome the overlapping data storage, idle processing time and I/O bottleneck problems that most degrade the performance of systolic architectures. It has practical advantages over the systolic array in terms of area efficiency, high throughput and practical input schemes.
Design Optimization of VLSI Array Processor Architecture for Window Image Processing
Dongju LI Li JIANG Hiroaki KUNIEDA

PAPER

Vol: E82-A No:8
Page(s): 1475-1484
In this paper, we present a novel architecture named Window-MSPA, which targets window operations in image processing. We have previously developed a Memory Sharing Processor Array (MSPA) for fast array processing with regular iterative algorithms. Window-MSPA optimizes the data I/O ports and the number of processing elements so as to reduce hardware cost. The input scheme for image data is restricted to row-by-row input, which simplifies the I/O architecture. Under this practical I/O restriction, the fastest processing is achieved. In this paper, we present the general Window-MSPA design methodology for a wide variety of applications. As a practical application, we have already reported the design of an MP@HL MPEG2 Motion Estimator LSI. Design formulas for the Window-MSPA architecture are given for various sizes of window operations in image processing. Thus, the derived architecture is flexible enough to satisfy the user's requirements for either area or speed.
Bits Truncation Adaptive Pyramid Algorithm for Motion Estimation of MPEG2
Li JIANG Kazuhito ITO Hiroaki KUNIEDA

PAPER

Vol: E80-A No:8
Page(s): 1438-1445
In this paper, a new bits truncation adaptive pyramid (BTAP) algorithm for motion estimation is presented. The method truncates the gray level from 8 bits to far fewer bits in the searching algorithm. Compared with conventional fast block matching algorithms, this method drastically improves the speed of motion estimation on reduced gray-level images while preserving reasonable performance and algorithm reliability. The bits truncation concept is combined with a hierarchical pyramid algorithm in order to truncate adaptively according to image characteristics. The computational complexity is much lower than that of the pyramid algorithm and the 3-Step motion estimation algorithm because of the bit-truncated search and low-overhead adaptation. Nevertheless, the PSNR is comparable with that of these two algorithms for various video sequences.
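The core idea of searching on bit-truncated gray levels can be sketched as follows. This is only an illustration under stated assumptions: it uses a plain exhaustive search rather than the adaptive pyramid search of the paper, and the function names (`truncate`, `sad`, `full_search`) and the `keep_bits` parameter are hypothetical.

```python
def truncate(frame, keep_bits):
    """Keep only the `keep_bits` most significant bits of each 8-bit pixel."""
    shift = 8 - keep_bits
    return [[p >> shift for p in row] for row in frame]


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of `cur` at (bx, by)
    and the block of `ref` displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search, keep_bits=8):
    """Exhaustive block matching on bit-truncated frames (assumed stand-in
    for the paper's search); returns the (dx, dy) with the lowest SAD."""
    cur_t, ref_t = truncate(cur, keep_bits), truncate(ref, keep_bits)
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip displacements whose reference block leaves the frame.
            if by + dy < 0 or by + dy + n > h or bx + dx < 0 or bx + dx + n > w:
                continue
            cost = sad(cur_t, ref_t, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]
```

With fewer bits per pixel the absolute differences are narrower, which is what makes the matching hardware cheaper; how many bits can be dropped without hurting the match depends on image content, which is the part the adaptive scheme addresses.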
A New FPGA Architecture for High Performance Bit-Serial Pipeline Datapath
Akihisa OHTA Tsuyoshi ISSHIKI Hiroaki KUNIEDA

PAPER-VLSI Design Technology and CAD

Vol: E83-A No:8
Page(s): 1663-1672
In this paper, we present our work on the design of a new FPGA architecture targeted at high-performance bit-serial pipeline datapaths. Bit-parallel systems require a large amount of routing resources, which is especially critical when using FPGAs. Their device utilization and operating frequency become low because of the large routing penalty. In contrast, bit-serial circuits are very efficient in routing and can therefore achieve very high logic utilization. Our proposed FPGA architecture is designed taking into account the structure of bit-serial circuits to optimize the logic and routing architecture. Our FPGA guarantees near 100% logic utilization with a straightforward place and route tool, owing to the high routability of bit-serial circuits and a simple routing interconnect architecture. The FPGA chip core which we designed consists of around 200k transistors on a 3.5 mm square substrate using 0.5 µm 2-metal CMOS process technology.
Narrow Fingerprint Template Synthesis by Clustering Minutiae Descriptors
Zhiqiang HU Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA

PAPER-Pattern Recognition

Publicized: 2017/03/08
Vol: E100-D No:6
Page(s): 1290-1302
Narrow swipe sensors have been widely used in embedded systems such as smartphones. However, the captured image is much smaller than that obtained by a traditional area sensor. Therefore, limited template coverage is the performance bottleneck of such systems. To increase the geometric coverage of templates, a novel fingerprint template feature synthesis scheme is proposed in the present study. This method can synthesize multiple input fingerprints into a wider template by clustering the minutiae descriptors. The proposed method consists of two modules. First, a user behavior-based Registration Pattern Inspection (RPI) algorithm is proposed to select the qualified candidates. Second, an iterative clustering algorithm, Modified Fuzzy C-Means (MFCM), is proposed to process the large number of minutiae descriptors and then generate the final template. Experiments conducted on a swipe fingerprint database show that this method significantly reduces the FRR (False Reject Rate) and EER (Equal Error Rate).
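For orientation, the standard fuzzy c-means iteration that MFCM presumably builds on can be sketched as below. The abstract does not describe the modification, so this is the textbook algorithm only; the function name, the representation of descriptors as plain coordinate tuples, and all parameters are assumptions made for illustration.

```python
import math


def fuzzy_c_means(points, centers, m=2.0, iters=50, eps=1e-9):
    """Textbook fuzzy c-means (not the paper's MFCM).

    points  : list of equal-length tuples (hypothetical descriptor vectors)
    centers : list of initial cluster centers of the same dimensionality
    m       : fuzzifier (> 1); larger values give softer memberships
    Returns the final centers and the memberships from the last iteration.
    """
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership of each point in each cluster from relative distances.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), eps) for c in centers]
            memberships.append(
                [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]
            )
        # Each center becomes the membership-weighted mean of all points.
        new_centers = []
        for i in range(len(centers)):
            weights = [u[i] ** m for u in memberships]
            total = sum(weights)
            new_centers.append(
                tuple(
                    sum(w * p[d] for w, p in zip(weights, points)) / total
                    for d in range(dims)
                )
            )
        centers = new_centers
    return centers, memberships
```

Unlike hard clustering, every point contributes to every center with a weight, which is one reason this family of methods suits noisy, partially overlapping inputs.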
Register-Based Process Virtual Machine Acceleration Using Hardware Extension with Hybrid Execution
Surachai THONGKAEW Tsuyoshi ISSHIKI Dongju LI Hiroaki KUNIEDA

PAPER-High-Level Synthesis and System-Level Design

Vol: E98-A No:12
Page(s): 2505-2518
The Process Virtual Machine (VM) is typical software that runs applications inside operating systems. Its purpose is to provide a platform-independent programming environment that abstracts away details of the underlying hardware and operating system and allows bytecode (portable code) to be executed in the same way on any platform. Process VMs are implemented using an interpreter that interprets bytecode instead of directly executing host machine code. Thus, bytecode execution is slower than execution of a compiled programming language. Several techniques, including the “Fetch/Decode Hardware Extension” from our previous paper, have been proposed to speed up the interpretation of Process VMs. In this paper, we propose an additional methodology, the “Hardware Extension with Hybrid Execution”, to further enhance the performance of Process VM interpretation, focusing on the register-based model. This new technique provides an additional decoder which classifies bytecodes into either simple or complex instructions. With “Hybrid Execution”, simple instructions are executed directly on the hardware of the native processor, while complex instructions are emulated by the “extra optimized bytecode software handler” of the native processor. In order to eliminate the overhead of retrieving and storing operands in memory, we utilize physical registers instead of (low address) virtual registers. Moreover, the combination of three techniques (delay scheduling, mode predictor HW and branch/goto controller) can eliminate all of the mode-switching overhead between native mode and bytecode mode. The experimental results show that execution speed improvements of up to 16.9x, 16.1x and 3.1x can be achieved on arithmetic instructions, loop and conditional instructions, and method invocation and return instructions, respectively. The approximate size of the proposed hardware extension is 0.04 mm² (equivalent to 14.81k gates), and it consumes additional power of only 0.24 mW. The stated results are obtained from logic synthesis using TSMC 90 nm technology at 200 MHz.
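The simple/complex split can be pictured with a small software model. This is a toy sketch, not the proposed hardware: the opcode names, the choice of which opcodes count as simple, and the `run` function are all assumptions, and "direct execution" is merely imitated by an inline branch.

```python
# Assumed "simple" class; the paper's actual classification is not given here.
SIMPLE_OPS = {"mov", "add", "sub"}


def run(program, num_regs=4):
    """Interpret (op, a, b, c) tuples on a register file, counting how many
    instructions take the direct path and how many go to a software handler."""
    regs = [0] * num_regs
    stats = {"direct": 0, "handler": 0}

    def handler(op, a, b, c):
        # Stand-in for a software handler that emulates a complex bytecode.
        if op == "div":
            regs[a] = regs[b] // regs[c]
        else:
            raise ValueError(f"unknown opcode: {op}")

    for op, a, b, c in program:
        if op in SIMPLE_OPS:
            # The decoder classified this bytecode as simple: direct path.
            stats["direct"] += 1
            if op == "mov":
                regs[a] = b  # load immediate b into register a
            elif op == "add":
                regs[a] = regs[b] + regs[c]
            else:
                regs[a] = regs[b] - regs[c]
        else:
            # Complex bytecode: fall back to the software handler.
            stats["handler"] += 1
            handler(op, a, b, c)
    return regs, stats
```

The benefit of such a scheme grows with the share of instructions that land on the direct path, which is consistent with the larger speed-ups reported for arithmetic and loop instructions than for method invocation.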
Fast Fingerprint Classification Based on Direction Pattern
Jinqing QI Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA

PAPER-Image/Visual Signal Processing

Vol: E87-A No:8
Page(s): 1887-1892
A new and fast fingerprint classification method based on direction patterns is presented in this paper. This method is developed to be applicable to today's embedded fingerprint authentication systems, in which small area sensors are widely used. Direction patterns are handled in the direction map at block level, where each block consists of 8×8 pixels. It is demonstrated that searching for direction patterns in a specific area, generally called the pattern area, can classify fingerprints clearly and quickly. With our algorithm, a classification accuracy of 89% is achieved over 4000 images in the NIST-4 database, slightly lower than that of conventional approaches. However, the classification speed is improved tremendously, to about 10 times that of conventional singular point approaches.
Parallel Processing Architecture Design for Two-Dimensional Image Processing Using Spatial Expansion of the Signal Flow Graph
Tsuyoshi ISSHIKI Yoshinori TAKEUCHI Hiroaki KUNIEDA

PAPER

Vol: E76-A No:3
Page(s): 337-348
In this paper, a methodology for designing the processor array architecture for a wide class of image processing algorithms is proposed. The methodology is built on the concept of spatially expanding the SFG description, which enables us to handle the problem as merely one-dimensional signal processing. The problem of the I/O interface, which is critical in real-time processing, is also considered.
Narrow Fingerprint Sensor Verification with Template Updating Technique
SangWoo SIN Ru ZHOU Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA

PAPER-Algorithms and Data Structures

Vol: E95-A No:1
Page(s): 346-353
A novel Template Updating system for fingerprint verification systems used in mobile applications is introduced in this paper. With the proposed method, system performance is improved substantially over the original. It addresses not only the FRR (False Reject Rate) but also the small overlap problem caused by the very narrow sensor on a mobile phone. In the template updating system, templates are replaced with matched inputs towards a target structure which can expand the coverage of templates with large displacement and rotation. On the test database, the FRR is reduced by 79% in comparison with the system without the template updating procedure. This system was adopted in commercially available mobile phones in 2009.
Decomposition of Task-Level Concurrency on C Programs Applied to the Design of Multiprocessor SoC
Mohammad ZALFANY URFIANTO Tsuyoshi ISSHIKI Arif ULLAH KHAN Dongju LI Hiroaki KUNIEDA

PAPER-VLSI Design Technology and CAD

Vol: E91-A No:7
Page(s): 1748-1756
A simple extension used to assist the decomposition of task-level concurrency within C programs is presented in this paper. The concurrency decomposition is meant to be used as the point of entry for Multiprocessor System-on-Chips (MPSoC) architectures' design-flow. Our methodology allows the (re)use of readily available reference C programs and enables easy and rapid exploration for various alternatives of task partitioning strategies; a crucial task that greatly influences the overall quality of the designed MPSoC. A test case using a JPEG encoder application has been performed and the results are presented in this paper.
Dedicated Design of Motion Estimator with Bits Truncation Fast Algorithm
Li JIANG Dongju LI Shintaro HABA Chawalit HONSAWEK Hiroaki KUNIEDA

PAPER

Vol: E81-A No:8
Page(s): 1667-1675
In this paper, a dedicated hardware design for a motion estimation LSI for MPEG2 is presented. By combining our bits truncation adaptive pyramid (BTAP) algorithm with the Window-MSPA architecture, the hardware cost is tremendously reduced without PSNR performance degradation compared with the mean pyramid algorithm. The core of the test chip, working at 83 MHz, performs a search range of 67 for an image size of 1920×1152 and achieves a video rate of 60 fields/s. It can be used for HDTV purposes. The chip size is 4.8 mm × 4.8 mm with 0.5 µm 2-level metal CMOS technology. The results in this paper show the promise of realizing a one-chip HDTV MPEG2 encoder.
A Low-Cost and Energy-Efficient Multiprocessor System-on-Chip for UWB MAC Layer
Hao XIAO Tsuyoshi ISSHIKI Arif Ullah KHAN Dongju LI Hiroaki KUNIEDA Yuko NAKASE Sadahiro KIMURA

PAPER-Computer System

Vol: E95-D No:8
Page(s): 2027-2038
Ultra-wideband (UWB) technology has attracted much attention recently due to its high data rate and low emission power. Its media access control (MAC) protocol, WiMedia MAC, promises many facilities for high-speed and high-quality wireless communication. However, these benefits in turn involve a large computational load, which makes it difficult for implementations based on the traditional uniprocessor architecture to provide the required performance. At the same time, the constrained cost and power budget makes commercial multiprocessor solutions unrealistic. In this paper, a low-cost and energy-efficient multiprocessor system-on-chip (MPSoC), which tackles at once the aspects of system design, software migration and hardware architecture, is presented for the implementation of the UWB MAC layer. Experimental results show that the proposed MPSoC, based on four simple RISC processors and a shared-memory infrastructure, achieves up to 45% performance improvement and 65% power saving, and takes 15% less area than the uniprocessor implementation.
A Fingerprint Matching Using Minutia Ridge Shape for Low Cost Match-on-Card Systems
Andy SURYA RIKIN Dongju LI Tsuyoshi ISSHIKI Hiroaki KUNIEDA

PAPER-Digital Signal Processing

Vol: E88-A No:5
Page(s): 1305-1312
In recent years, there has been an increasing trend of using biometric identifiers for personal authentication. Encouraged by advances in smart card technologies, fingerprint matching is increasingly embedded into smart cards as an effective personal authentication method. However, the current generation of low-cost smart cards is usually equipped with limited hardware resources such as an 8-bit or 16-bit microcontroller. Fingerprint matching is typically a time-consuming, computationally intensive and costly process. Therefore, it is still a challenge to integrate fingerprint matching into a smart card. In this paper, we present a fast, memory-efficient fingerprint matching method using a minutia ridge shape feature. Compared with the conventional minutiae-based feature, this feature offers a smaller template size, a smaller memory requirement, faster matching and more robust matching against image distortion. The implementation result shows that the proposed method can be embedded in smart cards for a real-time Match-on-Card system.
New Rate Control Method with Minimum Skipped Frames for Very Low Delay in H.263+ Codec
Trio ADIONO Tsuyoshi ISSHIKI Chawalit HONSAWEK Kazuhito ITO Dongju LI Hiroaki KUNIEDA

PAPER-Image

Vol: E85-A No:6
Page(s): 1396-1407
A new H.263+ rate control method that has very low encoder-decoder delay, a small buffer and low computational complexity for hardware realization is proposed in this paper. This method focuses on producing low encoder-decoder delay in order to solve the lip synchronization problem. Low encoder-decoder delay is achieved by improving target bit rate achievement and reducing processing delay. The target bit rate achievement is improved by allocating optimum frame encoding bits and employing a new adaptive threshold for zero vector motion estimation. The processing delay is reduced by simplifying quantization parameter computation, applying a new non-zero coefficient distortion measure and utilizing previous frame information in current frame encoding. The simulation results indicate a very large reduction in the number of skipped frames in comparison with the test model TMN8. There were 80 fewer skipped frames than with TMN8 within a 380-frame sequence when encoding a very high movement video sequence. The 27 kbps target bit rate is achieved with insignificant difference for various types of video sequences. The simulation results also show that our method successfully allocates encoding bits, keeps the amount of data in the encoder buffer small and prevents buffer overflow and underflow.
Modularization and Processor Placement for DSP Neo-Systolic Array
Kazuhito ITO Kesami HAGIWARA Takashi SHIMIZU Hiroaki KUNIEDA

PAPER

Vol: E76-A No:3
Page(s): 349-361
A further study on a VLSI system compiler, named VEGA (VLSI Embodiment for General Algorithms), is presented. It maps a general digital signal processing algorithm onto a neo-systolic array, which is a VLSI-oriented multiprocessor array. The highly complicated mapping problem is divided into subproblems such as modularization, operation grouping, processor placement, scheduling, control logic synthesis, and mask pattern generation. In this paper, a modularization technique is proposed which homogenizes all the operations of the processing algorithm into multiply-add operations. A processor placement algorithm that maps the processing algorithm onto a neo-systolic array so as to minimize data transfer time is also proposed.
The Improvement in Performance-Driven Analog LSI Layout System LIBRA
Tomohiko OHTSUKA Nobuyuki KUROSAWA Hiroaki KUNIEDA

PAPER

Vol: E76-A No:10
Page(s): 1626-1635
The paper presents an improvement of our new approach to optimizing process parameter variation, device heat and wire parasitics for analog LSI design by explicitly incorporating various performance estimations into the objective functions for placement and routing. To minimize these objective functions, placement by the simulated annealing method and maze routing are effectively modified with the performance estimation. The improvement results in excellent performance-driven layouts for large analog LSIs.
Distributed Load Balancing Schemes for Parallel Video Encoding System
Zhaochen HUANG Yoshinori TAKEUCHI Hiroaki KUNIEDA

PAPER-Parallel/Multidimensional Signal Processing

Vol: E77-A No:5
Page(s): 923-930
We present distributed load balancing mechanisms implemented on multiprocessor systems for real-time video encoding, which dynamically equalize load among PEs to cope with extensive computing requirements. A loosely coupled multiprocessor system, e.g. a torus-connected one, is treated as the target system. Two load balancing algorithms with decentralized control are proposed, and mathematical analyses are provided to gain insight into these decentralized control mechanisms. We also show, theoretically and experimentally, that the proposed algorithms are stable and effective.
A Clock and Data Recovery PLL for Variable Bit Rate NRZ Data Using Adaptive Phase Frequency Detector
Gijun IDEI Hiroaki KUNIEDA

PAPER

Vol: E87-C No:6
Page(s): 956-963
An adaptive 4-state phase-frequency detector (PFD) for a clock and data recovery (CDR) PLL for non-return-to-zero (NRZ) data is presented. The PLL achieves false-lock-free operation with rapid frequency capture and a wide bit-rate capture range. Variable bit rate operation is achieved by adaptive control of the data delay. The circuitry and overall architecture are described in detail. A z-domain analysis is also presented.
HOG-Based Object Detection Processor Design Using ASIP Methodology
Shanlin XIAO Tsuyoshi ISSHIKI Dongju LI Hiroaki KUNIEDA

PAPER-VLSI Design Technology and CAD

Vol: E100-A No:12
Page(s): 2972-2984
Object detection is an essential and expensive process in many computer vision systems. It is hard for standard off-the-shelf embedded processors to achieve a performance-power balance when implementing object detection applications. In this work, we explore an Application Specific Instruction set Processor (ASIP) for object detection using the Histogram of Oriented Gradients (HOG) feature. Algorithm simplifications are adopted to reduce memory bandwidth requirements and mathematical complexity without losing reliability. Also, parallel histogram generation and an on-the-fly Support Vector Machine (SVM) calculation architecture are employed to reduce the necessary cycle counts. The HOG algorithm on the proposed ASIP was accelerated by a factor of 63x compared to the pure software implementation. The ASIP was synthesized for a standard 90 nm CMOS library, with a silicon area of 1.31 mm² and 47.8 mW power consumption at a frequency of 200 MHz. Our object detection processor can achieve 42 frames per second (fps) on VGA video. The evaluation and implementation results show that the proposed ASIP is both area-efficient and power-efficient while being competitive with commercial CPUs/DSPs. Furthermore, our ASIP exhibits performance comparable even to hard-wired designs.
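The histogram generation step that the ASIP parallelizes can be illustrated with a plain software version for a single cell. This is a generic HOG sketch, not the simplified algorithm of the paper: the 9-bin unsigned-orientation layout, the central-difference gradients, the hard (non-interpolated) bin assignment and the name `cell_histogram` are all assumptions.

```python
import math


def cell_histogram(cell, bins=9):
    """Unsigned-orientation gradient histogram for one cell.

    cell : list of pixel rows; only interior pixels are used so that
           central differences stay inside the cell.
    """
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    height, width = len(cell), len(cell[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

For an 8×8 cell whose intensity rises by 10 per column, every interior gradient points along the x axis, so all of the magnitude lands in the first bin. Each pixel needs a square root and an arctangent in this form, which hints at why reducing mathematical complexity matters for a hardware implementation.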
